Hierarchical Clustering Of Verbs

Authors

  • Roberto Basili
  • Maria Teresa Pazienza
  • Paola Velardi
Abstract

In this paper we present an unsupervised learning algorithm for incremental concept formation, based on an augmented version of COBWEB. The algorithm is applied to the task of acquiring a verb taxonomy through the systematic observation of verb usages in corpora. Applying a machine learning methodology to a natural language problem required adjustments on both sides. Concept formation algorithms assume that the input information is stable, unambiguous, and complete. Linguistic data, by contrast, are ambiguous, incomplete, and possibly erroneous. An NL processor is used to extract the thematic roles of verbs semi-automatically from corpora and to derive a feature-vector representation of verb instances. To account for multiple instances of the same verb, the measure of category utility defined in COBWEB has been augmented with the notion of memory inertia. Memory inertia models the influence that previously classified instances of a given verb have on the classification of subsequent instances of the same verb. Finally, a method is defined to identify the basic-level classes of an acquired hierarchy, i.e. those carrying the most predictive information about their members.
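For readers unfamiliar with COBWEB, the following is a minimal sketch of the standard category utility measure for nominal attributes, as it is usually stated in the concept formation literature. It is not the authors' augmented measure: the abstract does not give the form of the memory inertia term, so that term is left out. The function names and the toy verb instances (pairs of agent type and object type) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def expected_correct_guesses(instances):
    """Sum over attributes i and values v of P(A_i = v)^2 within `instances`."""
    n = len(instances)
    if n == 0:
        return 0.0
    total = 0.0
    for i in range(len(instances[0])):
        counts = Counter(inst[i] for inst in instances)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(partition):
    """Standard COBWEB category utility of a partition (a list of clusters).

    Each cluster is a list of instances; each instance is a tuple of
    nominal attribute values. No memory inertia term is included.
    """
    clusters = [c for c in partition if c]  # ignore empty clusters
    if not clusters:
        return 0.0
    everything = [inst for c in clusters for inst in c]
    n = len(everything)
    baseline = expected_correct_guesses(everything)
    gain = sum(
        len(c) / n * (expected_correct_guesses(c) - baseline) for c in clusters
    )
    return gain / len(clusters)


# Hypothetical verb instances: (agent type, object type).
coherent = [
    [("human", "object"), ("human", "object")],
    [("organization", "money"), ("organization", "money")],
]
mixed = [
    [("human", "object"), ("organization", "money")],
    [("human", "object"), ("organization", "money")],
]
cu_coherent = category_utility(coherent)
cu_mixed = category_utility(mixed)
```

In this toy case the partition that groups identical role patterns scores higher than the one that mixes them, which is the kind of preference an incremental clusterer relies on when deciding where to place a new verb instance.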

Similar resources

‘Over reference’: a comparative study on German prefix-verbs

• Experiment: Hierarchical clustering of 4 × 10 prefix-verbs on über (over). We extracted vector representations for all items in our dataset (derived and simple verbs) by relying on a state-of-the-art technique (cf. Mikolov et al. [2013] continuous bag-of-words representation). The distributional semantic model on which our experiment was conducted was extracted from the SdeWac corpus (cf. Faaß...

Evaluating Hierarchies of Verb Argument Structure with Hierarchical Clustering

Verbs can only be used with a few specific arrangements of their arguments (syntactic frames). Most theorists note that verbs can be organized into a hierarchy of verb classes based on the frames they admit. Here we show that such a hierarchy is objectively well-supported by the patterns of verbs and frames in English, since a systematic hierarchical clustering algorithm converges on the same s...

Towards a Semantic Classification of Spanish Verbs Based on Subcategorisation Information

We present experiments aiming at an automatic classification of Spanish verbs into lexical semantic classes. We apply well-known techniques that have been developed for the English language to Spanish, proving that empirical methods can be re-used across languages without substantial changes in the methodology. Our results on subcategorisation acquisition compare favourably to the state of the...

Identifying Metaphor Hierarchies in a Corpus Analysis of Finance Articles

Using a corpus of over 17,000 financial news reports (involving over 10M words), we perform an analysis of the argument-distributions of the UP- and DOWN-verbs used to describe movements of indices, stocks, and shares. Using measures of the overlap in the argument distributions of these verbs and k-means clustering of their distributions, we advance evidence for the proposal that the metaphors re...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...

Feature Extraction of Concepts by Independent Component Analysis

Semantic clustering is important to various fields in the modern information society. In this work we applied the Independent Component Analysis method to the extraction of the features of latent concepts. We used verb and object noun information and formulated a concept as a linear combination of verbs. The proposed method is shown to be suitable for our framework and it performs better than a...

Publication date: 1993